The xView2 competition and xBD dataset spurred significant advancements in overhead building damage detection, but the competition's pixel-level scoring can lead to reduced solution performance in areas with tight clusters of buildings or uninformative context. We seek to advance automatic building damage assessment for disaster relief by proposing an auxiliary challenge to the original xView2 competition. This new challenge involves a new dataset and metrics indicating solution performance when damage is more local and limited than in xBD. Our challenge measures a network's ability to identify individual buildings and their damage level without excessive reliance on the buildings' surroundings. Methods that succeed on this challenge will provide more fine-grained, precise damage information than original xView2 solutions. The best-performing xView2 networks' performances dropped noticeably in our new limited/local damage detection task. The common causes of failure observed are that (1) building objects and their classifications are not separated well, and (2) when they are, the classification is strongly biased by surrounding buildings and other damage context. Thus, we release our augmented version of the dataset with additional object-level scoring metrics (https://gitlab.kitware.com/dennis.melamed/xfbd) to test independence and separability of building objects, alongside the pixel-level performance metrics of the original competition. We also experiment with new baseline models which improve independence and separability of building damage predictions. Our results indicate that building damage detection is not a fully-solved problem, and we invite others to use and build on our dataset augmentations and metrics.
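To make the idea of object-level scoring concrete, here is a minimal, hypothetical sketch (not the released xFBD metric): predicted buildings are matched to ground-truth buildings by bounding-box IoU, and a prediction only counts as correct if it both localizes a building and assigns the right damage class.

```python
# Illustrative object-level damage metric (a toy stand-in, not the
# released xFBD implementation). Buildings are axis-aligned boxes
# (x1, y1, x2, y2) paired with a damage label.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def object_level_accuracy(preds, truths, iou_thresh=0.5):
    """preds/truths: lists of (box, damage_label). A prediction scores
    only if it matches an unclaimed ground-truth building above the IoU
    threshold AND carries the correct damage class."""
    matched, correct = set(), 0
    for pbox, plabel in preds:
        best, best_iou = None, iou_thresh
        for i, (tbox, tlabel) in enumerate(truths):
            if i not in matched and iou(pbox, tbox) >= best_iou:
                best, best_iou = i, iou(pbox, tbox)
        if best is not None:
            matched.add(best)
            correct += plabel == truths[best][1]
    return correct / len(truths) if truths else 0.0
```

A metric of this shape penalizes exactly the failure modes named above: merged building objects fail the per-object match, and context-biased labels fail the per-object class check.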
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
There are many potential benefits to news readers accessing diverse sources. Modern news aggregators do the hard work of organizing the news, offering readers a plethora of source options, but choosing which source to read remains challenging. We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity. The framework is based on the generation of Discord Questions: questions with a diverse answer pool, explicitly illustrating source differences. To assemble a prototype of the framework, we focus on two components: (1) discord question generation, the task of generating questions answered differently by sources, for which we propose an automatic scoring method and create a model that improves on current question generation (QG) methods by 5%, and (2) answer consolidation, the task of grouping answers to a question that are semantically similar, for which we collect data and repurpose a method that achieves 81% balanced accuracy on our realistic test set. We illustrate the framework's feasibility through a prototype interface. Even though model performance at discord QG still lags human performance by more than 15%, generated questions are judged to be more interesting than factoid questions and can reveal differences in the level of detail, sentiment, and reasoning of sources in news coverage.
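The answer-consolidation component can be sketched as a grouping problem. The sketch below is a toy stand-in (the paper repurposes a learned semantic-similarity model; here `difflib` string similarity substitutes so the example stays self-contained), and the threshold is illustrative.

```python
# Toy sketch of answer consolidation: answers to a discord question are
# greedily grouped by pairwise similarity. difflib's string ratio is a
# placeholder for the learned semantic model used in the paper.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.6):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def consolidate(answers, threshold=0.6):
    """Greedy single-link grouping: each answer joins the first existing
    group containing a sufficiently similar answer, else starts a group."""
    groups = []
    for ans in answers:
        for group in groups:
            if any(similar(ans, member, threshold) for member in group):
                group.append(ans)
                break
        else:
            groups.append([ans])
    return groups
```

The number of resulting groups is one natural signal of answer diversity: a question whose answers collapse into a single group is not a discord question.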
Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
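The two stages described above can be sketched as a small pipeline skeleton. Everything below is a hypothetical schematic, not the authors' implementation: the search, agreement, and editing components are placeholder callbacks standing in for web retrieval and LLM-driven revision.

```python
# Schematic of a research-and-revision loop in the spirit of RARR
# (placeholder components, not the authors' system): (1) retrieve
# evidence for each claim in a model's output, (2) post-edit only the
# claims the evidence does not support, preserving the rest verbatim.

def research(claim, search_fn):
    """Stage 1: retrieve candidate evidence passages for one claim."""
    return search_fn(claim)

def revise(output, claims, search_fn, supported_fn, edit_fn):
    """Stage 2: edit only unsupported claims, leaving supported text
    untouched so the original output is preserved as much as possible."""
    for claim in claims:
        evidence = research(claim, search_fn)
        if not any(supported_fn(claim, e) for e in evidence):
            output = edit_fn(output, claim, evidence)
    return output
```

The key design point this skeleton illustrates is the asymmetry: attribution is attempted for every claim, but edits are applied only where attribution fails.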
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
In this paper, an enhanced soft robot prototype with custom-designed actuator-space string encoders is created to study dynamic soft robot trajectory tracking. The soft robot prototype embeds the proposed adaptive passivity control and an efficient dynamic model, making challenging trajectory-tracking tasks possible. We explore the tracking accuracy and the full potential of the proposed control strategy through experimental validation under different operating scenarios: various tracking speeds and external disturbances. In all experimental scenarios, the proposed adaptive passivity control outperforms conventional PD feedback-linearization control. The experimental analysis details the advantages and shortcomings of the proposed method and points out next steps for future dynamic control of soft robots.
This work presents a mitosis detection method using only one vanilla convolutional neural network (CNN). Our approach consists of two steps: given an image, we first apply the CNN with a sliding-window technique to extract patches containing mitoses; we then compute the class activation map of each extracted patch to obtain the precise location of the mitoses. To improve the generalizability of the model, we train the CNN with a series of data augmentation techniques, a loss robust to noisily labeled images, and an active learning strategy. Our method achieved an F1 score of 0.7323 with an EfficientNet-B3 model in the preliminary test phase of the MIDOG 2022 challenge.
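The sliding-window step of the first stage can be sketched as follows. This is an illustrative skeleton only: the CNN is replaced by a hypothetical `score_fn` callback, and the window size, stride, and threshold are placeholders, not the challenge submission's settings.

```python
# Minimal sketch of sliding-window patch extraction (the classifier is a
# placeholder callback standing in for the CNN): slide a fixed window
# over the image and keep the corner of every window whose score
# exceeds a detection threshold.

def sliding_window_detect(image, window, stride, score_fn, threshold=0.5):
    """image: 2D list of pixel values. Returns (row, col) of the
    top-left corner of each window flagged as containing a mitosis."""
    h, w = len(image), len(image[0])
    hits = []
    for r in range(0, h - window + 1, stride):
        for c in range(0, w - window + 1, stride):
            patch = [row[c:c + window] for row in image[r:r + window]]
            if score_fn(patch) >= threshold:
                hits.append((r, c))
    return hits
```

In the described method, the flagged patches would then be passed to the class activation map computation to refine the window-level hit into a precise mitosis location.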
Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, such as glass doors, difficult to perceive. A second challenge is that the depth sensors commonly used for opaque-object perception cannot obtain accurate depth measurements on transparent objects due to their unique reflective properties. Because of these challenges, we observe that transparent object instances within the same category (e.g., mugs) look more similar to each other than ordinary opaque objects of the same category do. In light of this observation, this paper sets out to explore the possibility of category-level transparent object pose estimation, rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that learns to estimate category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a recent large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. The results demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects, and the accompanying ablation studies yield key findings suggesting future directions for improved performance.
Generative adversarial networks (GANs) have been widely adopted in many application domains, such as data preprocessing, image editing, and creativity support. However, the "black box" nature of GANs prevents non-expert users from controlling what data a model generates, and has spawned a wealth of prior work focused on algorithm-driven approaches for extracting editing directions to control GANs. Complementarily, we propose Ganzilla: a user-driven tool that empowers users to iteratively discover directions matching their editing goals using the classic scatter/gather technique. In a study with 12 participants, Ganzilla users were able to discover directions that (i) edited images to match a provided example (closed-ended task) and (ii) met a high-level goal, such as making a face look happier, while showing diversity across individuals (open-ended task).
Intracerebral hemorrhage (ICH) is the deadliest subtype of stroke, with mortality rates as high as 52%. Due to the potential cortical disruption caused by craniotomy, conservative management (watchful waiting) has historically been a common treatment approach. Minimally invasive evacuation has recently become an accepted treatment for patients with deep-seated hematomas of 30-50 mL in volume, but adequate visualization and tool dexterity remain limited by conventional endoscopic approaches, especially for larger hematoma volumes (>50 mL). In this paper, we describe the development of ASPIHRE (A Surgical Platform for Intracerebral Hemorrhage Robotic Evacuation), the first-ever concentric tube robot that uses off-the-shelf plastic tubes for MR-guided ICH evacuation, improving tool dexterity and procedural visualization. The robot's kinematic model is developed using a calibration-based method and tube mechanics modeling, allowing the model to account for variable curvature and torsional deflection. Rotational accuracy is controlled to 0.317 +/- 0.3 degrees using a variable-gain PID algorithm. The hardware and theoretical models are validated in a series of systematic bench-top and MRI experiments, resulting in a tube-tip positional accuracy of 1.39 +/- 0.54 mm. Following validation of targeting accuracy, the robot's evacuation efficacy was tested in MR-guided phantom clot evacuation experiments. The robot was able to evacuate an initially 38.36 mL clot in 5 minutes, leaving a residual hematoma of 8.14 mL, well below the 15 mL guideline and indicating a good post-evacuation clinical outcome.
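The variable-gain PID control mentioned above can be sketched in a few lines. All constants and the gain schedule here are illustrative placeholders, not ASPIHRE's tuned controller: the idea shown is simply that the proportional gain scales with error magnitude, so large rotational errors are corrected aggressively while small errors avoid overshoot.

```python
# Hedged sketch of a variable-gain PID loop (illustrative gains and
# schedule, not the paper's tuned values): Kp is boosted for large
# errors and relaxed near the setpoint.

class VariableGainPID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def gain_schedule(self, error):
        # Illustrative schedule: scale Kp up to 2x with error magnitude.
        return self.kp * min(2.0, 1.0 + abs(error))

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.gain_schedule(error) * error
                + self.ki * self.integral
                + self.kd * derivative)
```

For MR-guided actuation the real controller must also respect hardware and imaging constraints (update rate, actuator limits) that this sketch omits.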